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Abstract.  Optimality  functions  in  nonlinear  programming  conveniently  measure,  in  some  sense, 
the  distance  between  a  candidate  solution  and  a  stationary  point.  They  may  also  provide  guidance 
towards  the  development  of  implementable  algorithms.  In  this  paper,  we  use  an  optimality  func¬ 
tion  to  construct  procedures  for  validation  analysis  in  stochastic  programs  with  nonlinear,  possibly 
nonconvex,  expected  value  functions  as  both  objective  and  constraint  functions.  We  construct  an 
estimator  of  the  optimality  function  and  examine  its  consistency,  bias,  and  asymptotic  distribution. 
The  estimator  leads  to  confidence  intervals  for  the  value  of  the  optimality  function  at  a  candidate 
solution  and,  hence,  provides  a  quantitative  measure  of  solution  quality.  We  also  construct  an 
implementable  algorithm  for  solving  smooth  stochastic  programs  based  on  sample  average  approxi¬ 
mations  and  the  optimality  function  estimator.  Preliminary  numerical  tests  illustrate  the  proposed 
algorithm  and  validation  analysis  procedures. 

Keywords:  Stochastic  programming;  nonlinear  programming;  optimality  conditions;  validation  analysis. 

1  Introduction 

Stochastic  optimization  problems  arise  in  numerous  context  where  decisions  must  be  made  in 
the  presence  of  data  uncertainty;  see  the  books  [13,  9,  20,  17,  37,  34]  and  references  therein  for 
algorithms,  models,  and  applications.  In  this  paper,  we  deal  with  a  class  of  stochastic  optimization 
problems  defined  in  terms  of  expected  values  of  random  functions.  Let  :  IR'^  x  D  ^  IR, 
j  =  0, 1,2, ...,  q,  be  random  functions  defined  on  a  common  probability  space  (D,  F,  V),  with  D  C  IR*^ 
and  F  <Z  2^  being  the  Borel  sigma  algebra.  Moreover,  let  the  expected  value  functions  :  IR”  ^ 
IR  U  {— oo,  oo}  be  defined  by 

f{x)  =  E[F\x,w)]  (1) 

for  all  j  G  Ro  =  {0}  U  q,  with  q  =  {l,2,...,g},  where  E  is  the  expectation  with  respect  to  V.  Below 
we  impose  conditions  that  ensure  finiteness  of  f^{x),  j  G  qo,  for  all  x  G  IR""  of  interest.  Optimization 
problems  involving  such  expected  value  functions  are  generally  challenging  to  solve  due  to  the  need 
for  estimating  expectations  repeatedly  during  the  optimization.  Even  assessing  how  “close”  a  given 
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candidate  point  x  G  is  to  optimality  or  stationarity  may  be  nontrivial;  see  [4]  and  the  review 
below.  We  specifically  consider  the  stochastic  optimization  problem 

P  ■  {/°(x)  I  f{x)  <  0,j  G  q},  (2) 

xeiR 

where  we  assume  that  /■^(•),  j  G  qo,  are  continuously  differentiable,  but  possibly  nonconvex.  Non- 
convex  stochastic  optimization  problems  arise  in  such  diverse  applications  as  estimation  of  mixed 
logit  models  [2],  engineering  design  [29],  and  inventory  control  [39].  We  focus  on  the  assessment 
of  a  candidate  point  x  G  IR”  for  P,  which  we  refer  to  as  validation  analysis,  but  also  consider  the 
generation  of  such  candidate  points  by  an  algorithm. 

Stationary  points  of  P  are  defined  by  the  Karush-Kuhn- Tucker  (KKT)  or  the  Fritz- John  first- 
order  necessary  optimality  conditions;  see  for  example  Propositions  3.3.1  and  3.3.5  in  [7]  or  Theorem 
2.2.4  in  [25].  If  the  evaluation  of  f^{x)  and  Vf^{x),  j  G  qo,  could  be  accomplished  in  finite  (and 
relatively  short)  time,  then  it  would  be  possible  to  verify  whether  x  G  IR”"  is  stationary  or  near¬ 
stationary  by  solving  a  convex  quadratic  program,  see,  e.g.  Theorem  2.2.8  in  [25].  In  the  present 
context,  however,  /■^(•),  j  G  qo,  are  defined  in  terms  of  expectations  and,  as  we  will  see,  Vf^{x), 
J  G  qo,  are  defined  similarly.  Hence,  f^{x)  and  V f^{x),  j  G  qo,  cannot  generally  be  evaluated  in 
finite  time  resulting  in  challenging  validation  analysis  for  P. 

Previous  studies  of  validation  analysis  in  stochastic  programming  often  deal  with  special  cases 
of  P.  In  the  case  of  uncertainty  in  the  objective  function  only,  i.e.,  j  G  q,  are  deterministic 

functions,  [24]  and  [22]  propose  procedures  for  estimating  upper  and  lower  bounds  on  the  optimal 
value  of  P.  Other  efforts  to  compute  bounds  on  the  optimal  value  include  [15,  3];  see  also  the 
tutorial  [4].  These  procedures  are  limited  to  convex  problems  as  they  require  global  minima  of 
sample  average  approximations  constructed  by  replacing  the  expectations  in  P  by  their  sample 
averages,  or  as  they  make  use  of  strong  duality. 

Under  the  same  assumption  of  deterministic  constraints,  [35]  develops  confidence  regions  for 
V f{x)  as  well  as  hypothesis  tests  for  whether  a  point  x  G  IR”  satisfies  the  KKT  conditions;  see 
also  [14].  The  results  in  [35]  can  be  extended  to  constraints  defined  in  terms  of  expectations  [33], 
though  that  result  appears  unpublished.  The  hypothesis  tests  require  that  the  gradients  of  the 
active  constraints  are  linearly  independent,  the  strict  complimentary  condition  holds  at  x,  and  that 
the  inverse  of  an  estimate  of  a  variance-covariance  matrix  is  nonsingular. 

Stochastic  programs  with  chance  constraints  are  outside  the  scope  of  this  paper  (see  for  example 
[21]  and  Chapter  4  of  [34]),  but  relate  to  P  as  they  are  essentially  of  the  same  form  as  P  except  that 
is  deterministic  and  F^{-,uj),  j  G  q,  are  indicator  functions.  For  such  programs  validation 
analysis  may  involve  estimating  the  probability  of  feasibility  for  a  candidate  point  as  well  as  the 
use  of  Lagrangian  relaxation  to  obtain  bounds  on  the  optimal  value;  see  for  example  Section  5.7  in 
[34]  and  references  therein. 

Validation  analysis  for  the  full  problem  P  has  received  less  attention.  On  p.  208  in  [34], 
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a  lower  bound  on  the  optimal  value  of  P  is  proposed  based  on  the  Lagrangian  function.  The 
bound,  however,  may  be  rather  weak  in  the  case  of  a  nonconvex  problem.  Section  5.2  of  [34] 
(see  also  [32,  12,  2])  uses  stochastic  variational  inequalities  to  analyze  optimality  conditions  for  P. 
The  results  include  conditions  for  almost  sure  convergence  of  stationary  points  of  sample  average 
approximations  to  stationary  points  of  P  as  the  sample  size  grows.  Extension  of  such  results  to 
second-order  optimality  conditions  are  found  in  [2] .  A  similar  result  for  the  case  with  a  nonsmooth 
objective  function  and  deterministic  constraints  is  found  in  [39]. 

In  Section  5.2  of  [34],  we  find  that  under  the  linear  independence  constraint  qualification  and 
the  strict  complementarity  condition,  a  stationary  point  of  a  sample  average  approximation  with 
sample  size  N  is  approximately  normally  distributed  with  mean  equal  to  a  stationary  point  of  P  and 
with  standard  deviation  proportional  to  In  [30]  (see  also  [31]),  we  find  a  hypothesis  test  for 

checking  whether  a  candidate  solution  and  a  corresponding  Lagrange  multiplier  vector  satisfy  the 
KKT  conditions  for  the  case  with  both  inequality  and  equalities  defined  in  terms  of  expectations. 
That  paper  also  presents  confidence  intervals  for  the  constraint  functions  at  a  candidate  point. 
While  these  results  are  important,  they  do  not  directly  quantify  the  quality  of  a  candidate  solution 
that  is  not  stationary  for  a  sample  average  approximation.  In  practice,  we  can  usually  only  hope 
for  a  near-stationary  solution  of  P  and  its  sample  average  approximations.  Hence,  it  becomes 
important  to  assess  the  quality  of  such  solutions. 

In  this  paper,  we  develop  procedures  for  validation  analysis  of  a  candidate  point  x  £  IR'^.  Since 
P  may  be  nonconvex,  we  focus  on  first-order  necessary  optimality  conditions  as  validation  analysis 
by  means  of  bounds  on  the  optimal  value  (see  [4])  appears  difficult.  Specifically,  as  in  Section 
2.2.1  of  [25],  we  consider  Fritz-John  conditions  for  P  as  characterized  by  a  nonpositive  optimality 
function  which  vanishes  at  feasible  stationary  points.  Hence,  the  value  of  the  optimality  function 
at  X  gives  a  measure  of  the  quality  of  x.  We  specifically  provide  bounds  on  the  distance  between 
f^{x)  and  the  optimal  value  of  P  in  terms  of  the  value  of  the  optimality  function  at  x. 

The  optimality  function  involves  /■^(•)  and  V/-^(-),  j  £  qo,  and  can  therefore  only  be  esti¬ 
mated.  We  develop  a  strongly  consistent  estimator  for  the  optimality  function  at  x  and  examine 
its  asymptotic  distribution  and  bias.  We  also  develop  procedures  for  estimating  probabilistic  lower 
bounds  on  the  optimality  function  at  x  and  corresponding  confidence  intervals.  Since  the  optimality 
function  is  nonpositive  and  vanishes  at  a  feasible  stationary  point,  such  a  lower  bound  provides  a 
conservative  estimate  of  the  quality  of  x.  The  lower  bounds  may  also  lead  to  a  stopping  criterion 
for  algorithms  for  solving  P.  In  contrast  to  the  hypothesis  tests  in  [35],  which  check  the  KKT 
conditions  and  require  linear  independence  and  strict  complementary  constraint  qualifications,  we 
adopt  the  more  general  Fritz-John  conditions  and  require  no  constraint  qualification. 

The  estimator  of  the  optimality  function  can  be  viewed  as  an  optimality  function  of  a  sample 
average  approximation  of  P  in  the  case  j  £  qo,  are  continuously  differentiable  for  almost  all 

io  £  0,.  We  exploit  this  observation  and  develop  a  convergent  algorithm  for  P  under  this  additional 
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assumption  and  a  constraint  qualification. 

In  Section  2,  we  define  optimality  conditions  for  P  in  terms  of  an  optimality  function  and  show 
how  that  function  measures  the  distance  to  the  optimal  value  of  P.  Section  3  constructs  an  estimator 
for  the  optimality  function,  proves  its  consistency,  and  derives  the  asymptotic  distribution  of  an 
appropriately  scaled  and  shifted  estimator.  Section  4  develops  procedures  for  validation  analysis  by 
means  of  the  estimator  of  the  optimality  function.  Section  5  constructs  consistent  approximations 
and  presents  an  algorithm  for  P.  Section  6  gives  illustrative  numerical  examples. 


2  Optimality  Function 


In  this  section  we  state  optimality  conditions  for  P,  define  an  optimality  function,  and  prove  a  rela¬ 
tionship  between  the  optimality  function  at  a  feasible  point  x  G  IR”"  and  the  distance  between  f^{x) 
and  the  optimal  value  of  P.  We  adopt  the  Fritz-John  first-order  necessary  optimality  conditions; 
see  for  example  Theorem  2.2.4  in  [25].  Before  we  state  the  conditions,  we  give  assumptions  which 
ensure  that  j  G  qo,  are  finite  valued  and  continuously  differentiable.  We  observe  that  since 

■))  J  ^  RO)  are  random  functions,  it  follows  by  definition  that  F^{x,  •),  j  G  qo,  are  measurable 
for  every  x  G  IR”. 


Assumption  1  We  assume  that  for  a  given  set  S  C  IR*^,  the  following  hold  for  any  nonempty 
eompaet  set  X  C  S  and  for  all  j  G  qo.' 

(i)  There  exists  a  measurable  funetion  C  :  id  ^  [0,oo)  sueh  that  E[C{uj)]  <  oo  and  for  all  x  G  X 

and  almost  every  uj  G  il,  \F^{x,uj)\  <  C{uj). 

(ii)  There  exists  a  measurable  funetion  L  :  it  ^  [0,  oo)  sueh  that  E[L{uj)]  <  oo  and 

\F^ {x,(jj)  —  F^ {x  ,uj)\  <  L(ti;)||x  —  x'll  (3) 

for  all  x,x'  G  S  and  almost  every  tv  G  it. 

(iii)  For  every  x  G  X,  F^{-,uj)  is  eontinuously  differentiable  at  x  for  almost  all  uj  G  it. 


Assumption  1  is  rather  weak  and  commonly  made  in  the  literature;  see  for  example  Theorem 
7.52  in  [34].  A  large  number  of  applications  satisfy  Assumption  1  including  many  instances  of  two- 
stage  stochastic  programs  with  recourse  [17],  Conditional  Value-at-Risk  problems  [27],  inventory 
control  problems  [39],  and  engineering  design  problems  [29].  If  Assumption  1  holds  on  an  open 
set  S  and  A  C  5  is  compact,  then  it  follows  from  Theorem  7.52  in  [34]  that  /•^(•),  j  G  qo,  are 
continuously  differentiable  on  X  and  that  Vf^{x)  =  E[VxF^ {x,(jj)],  for  all  x  G  A  and  j  G  qo- 

We  need  the  following  notation.  For  any  vector  v,  we  adopt  the  convention  that  G  IR  denotes 
the  vector’s  j-th  component.  Let 


- 


G  1R'?+^ 


^  =  1,/i^  >  0,j  G  qo 


(4) 


j6qo 
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'0(x)  =  maxjgq /-^(x),  and  '0(a^)+  =  niax{0,  ^'(a:)}. 

The  Fritz-John  first-order  necessary  conditions  for  P  takes  the  following  form;  see,  for  example, 
Theorem  2.2.4  in  [25]. 


Proposition  1  If  x  €  IR”  is  a  local  minimizer  for  P  and  Assumption  1  holds  on  an  open  set 
S  C  IR”  containing  x,  then  there  exists  a  multiplier  vector  fi  G  such  that 


^  P^Vfix)  =  0 

(5) 

ieqo 

'^p^f{x)  =  0. 

(6) 

ieq 

□ 

We  refer  to  a  point  x  G  IR”^  that  satisfies  (5)  and  (6)  for  some  ft  G  as  a  Fritz-John  point.  We 
remark  that  the  Fritz-John  conditions  reduce  to  the  KKT  conditions  when  pP  >  0;  see  [25],  p.  189. 

We  follow  [25],  see  p.  190,  and  express  the  Fritz-John  conditions  by  means  of  a  continuous 
optimality  function  6  :  IR”^  ^  (— oo,  0]  defined  by 


9{x)  =  mm^  |max|  -  V’(x)+ -h  (V/°(x),/i), 


max{/-^(a;)  —  'if{x)+  +  (V/-^(x),  h)}\  -|-  i]' 
jeq  J 


(7) 


We  find  the  following  alternative  expression  for  9(x)  useful;  see  Theorem  2.2.8  in  [25]: 


6l(x)  =  -  mm  {/iV(x)+ -F^^-^[V’(x)+ - /^(x)] -h  I  }. 

M6SU  i.  ,  .  j 


(8) 


ieq 


i6qo 


Let  =  {x  G  IR”  j  'if{x)  <  0}  be  the  feasible  region  of  P.  The  optimality  function  equivalently 
expresses  the  Fritz-John  conditions  in  the  sense  stated  next;  see  Theorem  2.2.8  in  [25]. 


Proposition  2  Suppose  that  x  G  and  Assumption  1  holds  on  an  open  set  S  C  IR”  containing 
X.  Then,  9{x)  =  0  if  and  only  if  there  exists  a  multiplier  vector  p  gTP^  such  that  (5)  and  (6)  hold. 

In  view  of  Proposition  2,  the  closeness  of  9{x)  to  zero  indicates  the  proximity  of  x  to  a  Fritz- 
John  point.  Under  a  convexity  assumption,  9{x)  also  gives  a  bound  on  the  distance  between  f^{x) 
and  the  minimum  value  of  P  as  the  next  result  shows.  We  find  a  similar  result  for  two-stage 
stochastic  program  with  recourse  in  [14]. 


Proposition  3  Suppose  that  X^  is  nonempty,  f^{-),  j  G  qo,  are  twice  continuously  differentiable, 
and  that  there  exist  constants  0<m<l<M<oo  such  that 

m\\x'  —  xJl  <  {x'  —  X,  V^/-^(x)(x'  —  x))  <  M\\x'  —  xj],  (9) 
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for  all  X,  x'  G  IR"'  and  j  G  qo-  Then,  there  exists  a  eonstant  c  <  oo  sueh  that  for  any  x  G  X^, 


e(.)  -  ^  ^  ^  ^ 

m 

where  x  G  is  the  optimal  solution  of  P. 

Proof:  See  Appendix.  □ 

In  view  of  the  above  results,  the  optimality  function  offers  a  convenient  way  of  measuring  the 
quality  of  a  candidate  point.  Moreover,  as  we  see  in  Section  5,  the  optimality  function  also  provides 
guidance  towards  the  construction  of  an  implementable  algorithm  for  P. 

The  computation  of  9{x)  for  a  given  x  G  IR”  requires  the  solution  of  a  convex  quadratic 
program  with  linear  constraints  (see  (8)),  which  can  be  achieved  in  finite  time.  However,  the 
definition  of  6{x)  involves  f^{x)  and  V f^{x),  j  G  qo,  that,  in  general,  cannot  be  computed  in  finite 
time.  Consequently,  we  define  an  estimator  for  6{x)  using  the  sample  average  estimators  for  f^{x) 
and  VP{x),  j  G  qo. 


3  Estimator  of  Optimality  Function 

3.1  Definition  and  Consistency 


Let  ■■■  be  an  infinite  sequence  of  independent  random  vectors  each  with  value  in  H  and 

distributed  as  V.  Let  IN  =  {1, 2, 3, ...}.  We  define  for  any  N  G  IN,  j  G  qo,  and  x  G  IR"^,  the 
estimators  for  f^{x),  V f^{x),  'iffx),  and  'if{x)j^  by 

fNix)  =  —  ^F{x,UJi),  (11) 

i=i 

1  ^ 

=  (12) 

i=l 

f’Nix)  =  maxjgq  flf{x),  and  V'Ar(x)+  =  max{0,  V’Ar(a;)},  respectively.  We  refer  to  [11]  for  an  overview 
of  alternative  approaches  to  estimating  V f^{x).  In  some  situations  it  may  be  possible  to  use  vari¬ 
ance  reduction  techniques  to  define  alternative  estimators  with  smaller  variance  than  those  defined 
above;  see  for  example  Section  5.5  in  [34].  However,  such  estimators  are  beyond  the  scope  of  the 
paper. 

Finally,  we  define  the  estimator  of  9{x)  by 


9n{x) 


A 


mm 

/iGlR" 


max  I  -  'ifN{x)+  +  (V/^(x),/i), 


max{/^(x)  -  ^Pn{x)+  + 

J6q 


(V/^(x),/i)}} 


+ 


(13) 
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As  commonly  done,  we  view  j  G  qo,  ipNix),  ^jjN{x)+,  and  9n{x)  as  random  variables  and 

V  fj^{x),  j  G  qo,  as  random  vectors  defined  on  the  product  space  generated  by  (17,  T,  V)  and  denote 
the  resulting  probability  measure  and  sample  space  by  V  and  17,  respectively;  see  Chapter  7  of  [34] 
for  further  background.  To  emphasize  the  dependence  on  ca  =  (wi,  W2,  •••)  we  occasionally  write 
fj^{x,UJ),  ^]\f{x,cJ),  etc.  Usually,  however,  we  omit  to.  We  observe  that  under  Assumption  1,  for 
all  X  G  X,  j  G  qo,  are  continuously  differentiable  at  x  for  almost  every  w  G  17.  Hence, 

V fj^{x,uJ),  j  G  qo,  and  9n{x,u)  are  defined  for  almost  every  G  17. 

Similar  to  (8),  we  deduce  from  Theorem  2.2.8  of  [25]  the  following  equivalent  and  useful 
expression  for  9n{x)'. 

9n{x)  =  -  min  |/i°V’iv(a;)+  +  ~  +  sl  H  }•  (1^) 

ieq  "ieqo  ^ 

We  observe  that  since  the  objective  function  in  (8)  is  a  function  of  expectations,  standard 
results  about  the  relationship  between  the  optimal  value  of  a  problem  and  those  of  its  sample 
average  approximations  (see  for  example  Chapter  5  in  [34])  are  not  applicable.  In  the  following, 
however,  we  make  use  of  similar  proof  techniques  as  in  Chapter  5  of  [34] . 

The  next  result  proves  that  9i\f{x)  is  a  strongly  consistent  estimator  of  9{x).  This  result 
is  similar  to  classic  results  about  almost  sure  convergence  of  optimal  values  of  sample  average 
approximations  to  the  optimal  value  of  an  original  problem;  see,  e.g.,  [19,  26]. 

Theorem  1  Suppose  that  Assumption  1  holds  on  an  open  set  eontaining  x  G  IR"".  Then,  9n{x) 
9{x),  as  N  ^  oo,  almost  surely. 

Proof:  It  follows  from  Assumption  1  that  for  all  j  G  q,  f^{x)  is  well  defined  and  finite  valued 
and  hence  the  strong  law  of  large  numbers  implies  that  fjq{x)  f^{x),  as  N  ^  oo,  almost  surely. 
Moreover,  Theorem  7.52  in  [34]  gives  that  V f^{x)  V f^{x),  as  N  ^  oo,  almost  surely.  We  define 
for  any  fi  G  the  function  77  :  ^  IR  by 

?7(/i)  =  7r>(x)+  +  ^/i^[7/7(x)+-/^(x)]  +  i||  ^ /i^V/^(x)  (15) 

i6q  j6qo 

and  similarly  we  define  the  function  ryjv  ^  ^  IR  by 

VN{h)=h^i’N{x)+  +  J2l^^\.^N(^)+- +  ■  (16) 

ieq  ieqo 

Since  is  compact,  it  follows  that  sup^g^^o  IvNih)  ~  9{h)  \  ^  0,  as  A"  ^  00,  almost  surely.  Hence, 
for  any  e  >  0,  there  exists  an  Nq  such  that  for  all  N  >  No  and  fi  G  \r]N{h)  ~  ^  almost 

surely.  Since  9{x)  =  —  min^g^^o  r]{fi)  and  9m{x)  =  —  min^gj^o  r]]\f{iJ,),  it  follows  that  \9]\f{x)  —9{x)\  < 
€  for  all  N  >  Nq  almost  surely.  This  completes  the  proof.  □ 
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3.2  Asymptotic  Distribution  of  Estimator 


We  next  examine  the  asymptotic  distribution  of  an  appropriately  shifted  and  scaled  Oj^ix)  for  a 
given  X  G  IR"".  Before  we  state  the  main  result  of  this  section  (Theorem  2),  we  need  to  establish 
some  notation. 

Let  for  any  x  G  IR”, 

S°(x)  =  {/iGS°  6»(x)  =/V'(x)+  +  ^^-^[V’(x)+-/^(x)]  +  |||  ^  },  (17) 

feq  jeqo 

q(x)  =  {j  G  q  I  ip{x)  =  f{x)},  and 

(  q(x)  U  {0}  if  'ip{x)  =  0 

q(x)+  =  <  q(x)  if  V’(a^)  >  0  (18) 

I  {0}  otherwise. 

We  use  v'  to  denote  the  transpose  of  a  vector  v  and  define  the  following  quantities: 

fix)  =  ifix),f{x), ...,  /^(x))',  (19) 

fNix)  =  {fjf{x),f‘^{x), ...,  fUx))',  (20) 

V7(x)  =  (V/°(x)',  Vf\xy, ...,  Vf^ix)')',  (21) 

and 

V7^(x)  ^  (V/O  (x)',  Vfhix)', V/^(x)')'.  (22) 

We  need  the  following  light-tail  assumption  to  ensure  a  central  limit  theorem. 

Assumption  2  We  assume  that  for  a  given  x  G  IR”,  E[F^ {x,uj)‘^]  <  oo  for  all  J  G  q  and 
E[{dF^ {x ,  u) / dx^"^]  <  oo  for  all  j  G  qo  and  i  =  1,2,  ...,n.  □ 

For  any  x  G  IR"",  we  let  Y{x)  denote  the  q+  {q+  l)n-dimensional  normal  random  vector  with 
zero  mean  and  variance-covariance  matrix  V{x),  where  V{x)  is  the  variance-covariance  matrix 
of  the  random  vector  {F^{x,u:),F‘^{x,uj),  ...,  F‘i{x,ijj),Y  xF^{x,ijj)'  xF^{x,Ljo)'  ,  ...,V  xF^{x,ijj)')'  . 
Moreover,  we  define  the  g-dimensional  random  vector  y_i(x)  and  the  n-dimensional  random  vectors 
Yj{x),  j  G  qo,  such  that  Y{x)  =  {Y-i{xy ,Yq{x)' ,Yi{x)' ,  ...,Yq{x)')' . 

We  use  to  denote  convergence  in  distribution.  The  following  vector-valued  central  limit 
theorem  is  well  known;  see,  for  example.  Theorem  29.5  in  [8]. 


We  next  provide  the  asymptotic  distribution  of  a  scaled  and  shifted  6n{x)- 


Theorem  2  Suppose  that  Assumption  2  holds  at  x  G  IR'^  and  that  Assumption  1  is  satisfied  on  an 
open  set  eontaining  x  G  IR"'.  Then, 

N^/\eN{x)-e{x))  ^  -  min  [^^W{x)  +  Y,  T^[W{x)-Yi^{x)\+Y,  piH  ^  f\x) ,Yfix))] 

/^eS0(a;)  ^gq^ 

(24) 

as  N  ^  oo,  where  W{x)  =  maxjg^(a,)g  Y^fix),  with  =  0. 

Proof:  The  proof  is  based  on  the  Delta  Theorem  7.59  in  [34] .  Let  g  :  ]R'J+('?+i)^  ^  t)e  defined 
for  any  C  =  (C-i,  Co>  C(,  •••,  Cq)  e  lR''+(''+^)’",  with  C-i  G  IR*^,  Cj  G  JR"",  j  G  qo,  by 

g(C)  =  -  min  {/u>(C)  +J2^^^[w(C)  -  Cii]  +  ^ll  ^  tiKj  \  (25) 

ieq  "jeqo 

where  w  :  IR'J+l'J+i)’^  ^  IR  is  defined  by  w{C,)  =  max{0,  maxjgq  Since  J2jeqo  ~  ^ 

^  G  Sg,  it  follows  that 

5(C)  =  -^(C)-<(*(C),  (26) 

where  fi  :  ]R'J+('?+i)^  ^  IR  is  defined  by 

fiiC)  =  min  {  -  +  y  J2  hKjll  }•  (27) 

''  ieq  ieqo 

Let 

qt«(C)  =  {i  G  q  I  maxC^i  =  Ci  J,  (28) 

fcGq 

and  _  _ 

f  q«;(Ou{0}  ifu;(C)=0 

qu>(0+  =  s  qu>(C)  if  w{C)  >  o  (29) 

I  {0}  otherwise. 

Moreover,  let 

s^c) = {m  G  sy  fiic)  =  -j2h^cu + y  E  -“"'of }.  (30) 

ieq  ieqo 

It  follows  from  Danskin  Theorem;  see,  for  example.  Theorem  7.21  in  [34],  that  w{-)  and  fi{-) 
are  locally  Lipschitz  continuous  and  directional  differentiable  with  directional  derivatives  at  C  G 
5^g+(g+i)n  direction  ^  e  ]R9+(9+i)«  given  by 

dw(C;~^)  =  max  (31) 

jeq™(C)+ 

with  =  0,  and 

#(C;?)  =  min_  {  -  +  E  ^"(  E  /^^Cfc,G')}-  (32) 

jgq  jgqo  fcgqg 
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Consequently,  g{-)  is  locally  Lipschitz  continuous  and  directional  differentiable  with  directional 
derivatives  at  C  £  IR'J+l'J+i)"-  in  the  direction  ^  G  ]]^'?+(9+i)»^  given  by 

dg(C;0  =  -  max  -  min  (  -  ^  /iV  ^  (33) 

iSqm(C)+  /iSS,^(C)  Jgq  jGqo  fcSqo 

Hence,  it  follows  from  Proposition  7.57  in  [34]  that  g{-)  is  Hadamard  directional  differentiable. 

In  view  of  Proposition  4,  Delta  Theorem  7.59  in  [34]  gives  that 

-  g{{f{x),Vf{x)y))  =>  dg{{f{x),Vj{xyy]  Y (x)).  (34) 

The  result  now  follows  from  the  facts  that  5'((/Ar(x),  V/jv(x)')')  =  0n{x),  ^((/(x),  V/(x)')')  =  0{x), 
^w{if{x),Vf{xyy)+  =  q(x)+,  and  S,ji((/(x),  V/(x)')')  =  Sg(x)  and  from  rearranging  terms.  □ 
In  general,  the  right-hand  side  in  (24)  is  not  a  normal  random  variable.  Hence,  ONix)  cannot 
be  expected  to  be  approximately  normal  even  for  large  N.  In  special  cases,  we  find  the  following 
interesting  corollaries. 


Corollary  1  Suppose  that  Assumption  2  holds  at  x  G  IR'^  and  that  Assumption  1  is  satisfied  on 
an  open  set  eontaining  x  G  IR”.  Then,  the  following  statements  hold: 

(i)  If  the  veetors  VP  (x),  j  G  qo,  are  linearly  independent,  then  Sg(x)  =  {/l(x)}  is  a  singleton  and 

N^/y9N{x)  -  e{x))  (35) 

^  -A°(x)iT(x)  -Y^fiyx)[w{x)-Yy{x)]  -  Y.  fiyx){  Y  h^^fHx),Yyx)), 

jsq  jeqo  fceqo 

as  N  ^  oo. 


(ii)  If  X  is  a  loeal  minimizer  of  P  and  the  veetors  V f^x),  j  G  q(x),  are  linearly  independent,  then 


i;q(x)  =  {fi{x)}  is  a  singleton  and 

N^^‘^9n{x)  ^  — IT(x)  -|-  Yj  y {x)Ylyx) 

ieq(a;)+ 

as  N  ^  oo.  Moreover,  if  in  addition  q(x)  =  {i(x)}  is  a  singleton,  then 


(36) 


^1/20  ^  N  ^  [  -m&xy),YyY'^]  +  fiY^\x)Yyy\x)  T  fY^){x)  =  0 

\  0  ifp(^)(x)<0  ^  ^ 


as  N  ^  oo. 


Proof:  If  the  vectors  V/-^(x),  j  G  qo,  are  linearly  independent,  then  the  matrix  yl(x)  =  (V/®(x), 
V/^(x),  ...,  V/'^(x))  has  rank  q  +  1.  Hence,  H(x)'H(x)  is  positive  definite  and  the  objective  function 
in  (8)  is  strictly  convex.  Consequently,  Sg(x)  is  a  singleton  and  part  (i)  follows  directly. 

Next,  consider  part  (ii).  Since  x  G  IR”  is  a  local  minimizer  of  P,  ^’{x)  <  0  and,  from 
Proposition  2,  9{x)  =  0.  Hence,  it  follows  from  (8)  that  there  exists  a  fi{x)  G  Sq(a;)  such  that 
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Ejeqo  A^(a;)VP(x)  =  0  and  Ejeq  -  P{x)]  =  0.  Consequently,  fipx)  =  0  for  all  j  Gq 

such  that  j  0  q(x)+.  We  deduce  from  the  Karush-Kuhn- Tucker  conditions  for  P  (see  for  example 
Theorem  3.3.1  in  [7])  that  under  the  stated  linear  independence  assumption,  Sg(x)  is  a  singleton. 
Since  T°i(x)  =  0  by  definition,  (24)  reduces  to  (36).  Finally,  (37)  follows  from  (36).  □ 

Corollary  2  Suppose  that  Assumption  2  holds  at  x  G  IR'^  and  that  Assumption  1  holds  on  an  open 
set  eontaining  x  G  IR"^.  If  all  eonstraints  are  deterministie,  i.e.,  Fp-,uj)  =  Fpx),  j  G  q,  then 

N^/^{eN{x)  -  0{x))  ^  -  min  iP/  f’^{x),Yo{x)),  (38) 

as  N  ^  oo. 

Proof:  This  result  follows  by  similar  argument  as  those  leading  to  Theorem  2. 

We  see  from  (38)  that  9n{x)  is  approximately  normal  when  Sg(x)  is  a  singleton 
the  limiting  distribution  degenerates  to  the  constant  zero  when  6{x)  =  0. 

The  next  corollary  deals  with  the  special  case  of  no  constraints. 

Corollary  3  Suppose  that  Assumption  2  holds  at  x  G  IR'^  and  that  Assumption  1  holds  on  an  open 
set  eontaining  x  G  IR”.  If  there  are  no  eonstraints  in  P,  then 

N^/\eN{x)  -  e{x))  ^  AA(0,  Vf{x)%{x)Vf{x)),  (39) 

as  N  ^  oo,  where  Vo(3;)  is  the  n-by-n  varianee-eovarianee  matrix  ofYQ{x)  (and  VxF^{x,ijj))  and 
Af{0,P)  denotes  a  zero-mean  normal  random  variable  with  varianee  P. 

Proof:  This  result  follows  from  Theorem  2.  It  can  also  be  shown  using  Delta  Theorem  7.59  in  [34] 
and  the  fact  (see  p.  6  in  [25])  that  in  this  case  we  obtain  the  simplifications 


0{x)  =  -^\\Vf{x)f 

(40) 

e^{x)  =  -^\\vfNix)f. 

(41) 

□. 

We  next  consider  the  bias  E6j\f{x)  —  6{x),  where  E  denotes  the  expectation  with  respect  to 
V.  Convergence  in  distribution  do  not  necessarily  imply  convergence  of  expectations.  Under  an 
uniform  integrability  property,  however,  the  convergence  of  expectations  is  ensured;  see  for  example 
p.  338  of  [8].  The  property  holds  under  several  assumptions,  one  of  which  is  used  in  the  next  result. 

Proposition  5  Suppose  that  Assumption  2  holds  at  x  G  IR""  and  that  Assumption  1  holds  on  an 
open  set  eontaining  x  G  IR”.  Moreover,  suppose  that  there  exists  an  e  >  0  sueh  that 

sup  E[\N^^‘^{6n{x)  —  6*(x))1^’''^]  <  oo.  (42) 

ATeiN 


□ 

.  Moreover, 
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Then, 


+ 


E6n{x)  —  6{x) 


min  \^^W{x)  +  y^  ^^[W{x) 

M6S0(x)  ^  ^ 


(43) 

Yl,{x)]  +  ^  /V/"(x),y,(x))}' 

j'eqo  fceqo 


Moreover,  i/Sg(x)  is  a  singleton,  then 

EONix)  -  9{x)  =  -N-^/^^W{x)]  +  o{N-^/^). 


(44) 


Proof:  From  Theorem  25.12  in  [8]  and  Theorem  2,  we  directly  obtain  (43).  Since  Tii,  j  G  q,  and 
Yjix),  j  G  qo,  have  zero  mean  and  Z)jgqo  ^  h-  ^  (44)  also  holds.  □ 

Conditions  that  ensure  that  Sg(a:)  is  a  singleton  is  given  in  Corollary  1. 

We  observe  that  the  bias  identified  above  is  similar  to  the  well-known  bias  of  the  optimal  value 
of  min2;gx^  relative  to  the  optimal  value  of  miujjgx^  f^{x)]  see,  for  example  p.  167  in  [34]. 

In  that  case,  the  bias  is  always  nonpositive.  In  the  present  case,  E9m{x)  may  be  larger  than  9{x). 
However,  in  the  absence  of  constraints  in  P,  it  follows  directly  from  (40)  and  (41)  and  Jensen’s 
inequality  that  for  any  G  IN, 

E9n{x)  <  9{x).  (45) 


4  Validation  Analysis 

In  this  section,  we  develop  procedures  for  assessing  the  quality  of  a  candidate  point  x  G  IR”". 
Specifically,  we  develop  confidence  intervals  and  probabilistic  bounds  on  9{x)  and  V'(x).  Using  such 
bounds,  we  may  claim  with  some  confidence  that  x  satisfies  the  conditions  'ip{x)  <  6  and  9(x)  >  —e 
for  a  given  J  >  0  and  e  >  0.  We  first  consider  the  situation  with  no  constraints  in  P,  second  deal 
with  near  feasibility,  and  third  bound  the  optimality  function  of  the  full  problem. 


4.1  Unconstrained  Optimization 


Suppose  that  there  are  no  constraints  in  P  and  let  x  G  IR""  be  a  candidate  solution.  In  view  of 
Corollary  3,  9]\f{x)  is  approximately  normal  with  mean  9{x)  and  variance  Y f  {x)'Vq{x)Y f  {x) / N  for 
large  N .  Hence,  it  is  straightforward  to  construct  a  confidence  interval  for  9{x).  Let 

1  ^ 

Vo,n{x)  =  ^  ^  '^(yxF{x,u;i)  -  fN{x))'{VxF{x,ijJi)  -  fN{x)).  (46) 

i=l 

be  the  standard  unbiased  estimator  of  Vq.  Then  for  large  N, 


(^9n{x)  -  Zc,^VfN{xyVo,N{x)VfN{x)/N,  o' 


(47) 
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is  an  approximate  100(1  —  a)%-confidence  interval  for  9{x),  where  Za  is  the  standard  normal  a- 
quantile.  In  (47)  and  other  confidence  intervals  below  we  use  a  quantile  of  the  standard  normal 
distribution  instead  of  one  of  the  t-distribution  as  the  sample  size  is  typically  relatively  large. 

We  observe  that  the  approximate  normality  of  9n{x)  does  not  directly  reflect  the  fact  that 
9n{x)  <  0  almost  surely.  However,  in  practice,  validation  analysis  is  almost  always  carried  out  at 
an  x  G  IR”"  with  9{x)  <  0  in  which  case  the  truncation  at  zero  is  insignificant  for  large  N .  Our 
numerical  experiments  indicate  that  the  normal  model  of  9]\f{x)  is  quite  accurate  for  both  9{x)  <  0 
and  9{x)  =  0;  see  Section  6.  The  confidence  interval  (47)  is  one-sized,  as  are  the  confidence  intervals 
derived  below.  While  it  is  easy  to  convert  (47)  into  a  two-sided  confidence  interval,  we  believe  that 
one-sided  confidence  intervals  are  more  suitable  in  the  present  context  as  9{x)  >  — e  is  a  natural 
(though  conceptual)  criterion  for  stopping  an  algorithm  applied  to  P.  Hence,  if  (47)  is  contained 
in  (— e,  0],  then  we  would  be  100(1  —  a)%  confident  that  9{x)  >  — e  is  satisfied. 


4.2  Near  Feasibility  in  P 


We  next  consider  the  full  problem  P  and  develop  a  procedure  for  determining  whether  x  G  IR”"  is 
nearly  feasible,  i.e.,  ^p{x)  <  S  for  some  <5  >  0.  We  adopt  a  simple  batching  approach  to  estimate  the 
value  of  ip(x).  In  the  ranking  and  selection  literature  we  find  more  sophisticated  and  potentially 
more  efficient  ways  of  determining  whether  x  is  nearly  feasible;  see  for  example  [18]  and  references 
therein.  It  is  also  possible  to  estimate  f^{x)  independently  for  each  constraints  j  G  q;  see  [30]. 
However,  we  do  not  explore  those  possibilities  further. 

Since  the  function  m  :  IR'^  ^  IR  defined  for  any  y  G  IR'^  by  m{y)  =  max^gq  y^  is  convex,  it 
follows  by  Jensen’s  inequality  that 

ip{x)  <  E'ip]\i{x).  (48) 


We  construct  a  confidence  interval  for  E'ip]s[{x),  which,  in  view  of  (48),  provides  a  conservative 
confidence  interval  for  'ip{x). 

For  given  N  and  M,  let  tpN,k{x),  A:  =  1,  2, ...,  M,  be  independent  random  variables  distributed 
as  'iI)m{x).  Then, 


M 


k=l 


(49) 


is  an  unbiased  estimator  of  £'^Ar(x).  If  E[E^ {x,uj)‘^]  <  oo  for  all  j  G  q,  then  a  central  limit 
theorem  holds  for  '4’n  Mix)i  i-e.,  Mix)  is  approximately  normal  with  mean  E'll^p^^x)  and  variance 
Har['0Ar(a:)]/M  for  large  M.  Let  ui^)  standard  unbiased  estimator  of  Far  [V’Ar(x)]  given 

by 


^  M 

s\,N,Mix)  =  ^  ^  '^{''pN,k{x)  — 

k=l 


(50) 


Then,  it  follows  that 


(— oo,  '4’N,Mix)  +  ZaSip^N,M  ix)/VM) 


(51) 
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is  an  approximate  100(1  —  a)%-confidence  interval  for  Eipjy^x)  for  large  M.  In  view  of  (48),  it 
follows  that  (51)  is  a  conservative  100(1  —  Q;)%-confidence  interval  for  'ijj^x)  for  large  M.  This 
confidence  interval  can  be  used  to  assess  whether  the  candidate  solution  x  satisfies  'il){x)  <  6. 


4.3  Constrained  Optimization 


We  propose  two  approaches  for  obtaining  confidence  intervals  for  6{x).  The  first  approach  makes 
use  of  the  following  result. 

Proposition  6  Consider  x  G  IR"'  and  suppose  that  Assumption  1  holds  on  an  open  set  eontaining 
X  G  IR"'.  Then,  for  any  /U  G  S' 


q’ 


2n 


0{x)>E  -  + ^ 

jGq  ieqo 

Proof:  For  any  /x  G  S^,  let  r)  :  IR'J+l'J+i)"'  ^  jp  defined  by 

i7(C)  =  max{0,maxCiJ  H 

i6q  jGqo 

for  any  C,  =  (C-i;  Co;  Ci)  ■■■Cq)  ^  1R'^“'“^'^“'"^^'^,  with  ((_i  G  IR”^  and  Qj  G  IR"",  j  G  qo-  Since  ??(•)  is  convex, 
it  follows  from  Jensen’s  inequality  that 


(52) 


(53) 


Ef,{{fN{x)',Vfjq{x)')')  >  7?((/(x)',  V/(x) 


(54) 


From  (8)  and  (54),  we  see  that 

V{{f{x)' ,Vj{x)')')  =  ii^'if{x)+  +  ^ii^{i}{x)+  + f^{x))  +  \\^^  H^Vp{x) 

ieq  ieqo 

>  —9{x). 


(55) 

(56) 


The  result  then  follows  from  the  fact  that  Efj{{fi,[{x)' ,  f  Nix)')')  equals  the  negative  of  the  right- 
hand  side  in  (52).  □ 

In  view  of  Proposition  6,  we  construct  a  conservative  confidence  interval  for  6{x)  by  com¬ 
puting  a  confidence  interval  for  the  right-hand  side  in  (52).  We  adopt  a  batching  approach  and, 
for  given  N  and  M,  let  r]N,k,  k  =  1,2,...,M,  be  independent  random  variables  distributed  as 
fHifNixyyj^ixyy).  Then, 


A 

^7V,M  — 


1 

M 


M 

X!  hN,k 
k=l 


(57) 


is  an  unbiased  estimator  of  E[fj{{fN{xy  ,V  f  ^{xy)')].  Under  sufficient  integrability  assumptions  for 
(/7v(a;)^  V/7v(x)'),  a  central  limit  theorem  holds  for  ^  and,  consequently,  Vn  M  approximately 
normal  with  mean  E[fj{{f]\f{xy ,V f iq{xyy)]  and  variance  Var[fj{{fj\f{xy ,V f j^{xyy)]/M  for  large 
M.  Let  s'^  be  the  standard  unbiased  estimator  of  Uar[r)((/Ar(x)',  V/jv(x)')')]  given  by 

^  M 

_i  '^(''lN,k  -  Vn,m)‘^-  (58) 

k=l 
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Then,  it  follows  that 


{~Vn,M  ~  ZaSr],N,M  {x)/Vm,0]  (59) 

is  an  approximate  100(1  —  a)%-confidence  interval  for  iii[— i)((/jv(x)',  V/jv(x)')')]  for  large  M.  In 
view  of  (52),  it  follows  that  (59)  is  a  conservative  100(1  —  Q;)%-confidence  interval  for  9(x)  for  large 
M.  To  compute  the  above  confidence  interval,  it  is  necessary  to  select  a  n  G  In  view  of  the 
proof  of  Proposition  6,  we  see  that  a  tighter  confidence  interval  can  be  expected  when  /x  G  Sg(x). 
Hence,  we  recommend  to  select  /x  as  the  optimal  solution  of  (14)  for  some  large  N.  We  note, 
however,  that  even  when  using  ^  G  Sg(x),  the  inequality  in  (52)  may  be  strict. 

The  second  approach  to  constructing  a  confidence  interval  for  6(x)  is  motivated  by  a  procedure 
for  obtaining  bounds  on  the  optimal  value  of  optimization  problems  with  chance  constraints;  see 
Section  5.7.2  in  [34].  The  approach  is  somewhat  more  limited  than  the  first  approach  as  it  requires 
the  following  independence  assumption. 

Assumption  3  We  assume  that  for  a  given  x  G  IR”,  the  random  veetors  {f^{x),Vf^{xyy,  j  G  q, 
are  statistieally  independent.  Moreover,  we  assume  that  V/°(x)  is  statistieally  independent  of 
{P{x),VP{xyy  for  all  j  G  q.  □ 

Assumption  3  trivially  holds  for  all  x  G  IR”  in  the  important  special  case  when  all  but  one  of  the 
random  functions  Fp-,Lv),  j  G  qo,  are  deterministic. 

It  is  beneficial  to  “decompose”  the  optimality  function  into  feasibility  and  optimality  parts. 
From  (7)  we  see  that  6{x)  =  — V^(x)+  +  u{x),  where 

u(x)  =  min  2;+i||/i|P  (60) 

s.t.  (V/°(x),  h)  <  z 

f{x)  +  {Vf{x),h)  <  z,j  G  q. 

Here,  — x/;(x)+  is  a  measure  of  feasibility  and  xx(x)  is  a  measure  of  optimality.  Similarly,  let 

un{x)  =  min  2:+^||/x|p  (61) 

he]R’",^elR  ^ 

s.t.  {Vf%{x),h)<z 

/^(x)  +  (V/^(x),/x)  <z,jG  q. 

The  next  lemma  provides  a  useful  relationship  between  u(x)  and  um{x). 

Lemma  1  Suppose  that  Assumptions  2  and  3  hold  at  x  G  IR"'  and  that  Assumption  1  holds  on  an 
open  set  eontaining  x  G  IR"'.  Then, 

V[un{x)  <  xx(x)]  >  (62) 
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Proof:  Suppose  that  {h,  z)  G  is  a  feasible  point  in  (60).  We  want  to  determine  the  probability, 

denoted  pN,  that  (h,  z)  is  feasible  in  (61).  Since  {h,z)  G  IR”'"'"^  is  feasible  for  (60),  we  obtain  that 

PN  =  p[{(V/^(x),/i)  <  z}p|  (  Pi  {/j(r(x)  +  (V/^(x),/i)  <  z})] 

jSq 

>  p[{(v/^(x)-v/°(x),h)  <o}n  (63) 

( n  {/^(^)  -  /'(^) + <  o})' . 

J6q 

Under  Assumption  3,  it  follows  that 

PN  >  v[{Vf%{x)  -  Vf{x),  ^)  <  o]  n  -  fix)  +  (V/^(x)  -  Vf{x),  h)<o\.  (64) 

ieq 

In  view  of  Proposition  4,  it  follows  that  f^{x)  —  \7f{x),h)  converges  in  distribution  to  a 

zero-mean  normal  random  variable.  Hence, 

lim  v\{Vf^{x)  -  Vf{x),h)  <  ol  >  1/2.  (65) 

N^oo  L  J 

We  observe  that  the  limit  in  (65)  is  not  equal  to  1/2  as  the  zero- mean  normal  random  variable  may 
have  zero  variance.  Similarly,  for  all  j  G  q,  —  ffx)  +  {V  fj^{x)  —  V ffx),  h))  converges 

in  distribution  to  a  zero-mean  normal  random  variable.  Hence,  for  all  j  G  q, 

hm  V\f^{x)  -  fix)  +  (V//,(x)  -  Vf{x),  /i)  <  ol  >  1/2.  (66) 

A'— >00  L  J 

Consequently,  lim  inf  at^oo  Ptv  >  l/2'^'''^.  Since  this  result  holds  for  any  {h,z)  G  IR'^"'"^  that  is 
feasible  in  (60),  it  also  holds  for  the  optimal  solution  in  (60).  If  {h,z)  G  IR”'"'"^  is  the  optimal 
solution  in  (60)  and  it  is  also  feasible  in  (61),  then 

UNix)  <  z  +  l\\hf  =  u{x).  (67) 

This  completes  the  proof.  □ 

Lemma  1  provides  the  basis  for  the  following  procedure  for  obtaining  a  probabilistic  lower 
bound  on  u{x).  This  procedure  is  essentially  identical  to  the  one  proposed  in  Section  5.7.2  of  [34] 
in  the  context  of  chance  constraints. 

Let  UN,kix),  k  =  1,2,...,  A,  be  independent  random  variables  distributed  as  UNix).  After 
obtaining  realizations  of  these  random  variables,  we  order  them  with  respect  to  their  values.  Let 
un,1iUn,2,  ■■■,un,Ki  with  UN,k  <  UN,k+G  be  this  ordered  sequence.  That  is,  is  the  smallest 
value  of  UN,kix),  k  =  1,2, ...,  K,  un,2  is  the  second  smallest,  etc.  Suppose  that  is  a  lower  bound 
on  'P[uAr(x)  <  m(x)]  and  suppose  that  for  a  given  /?  G  (0, 1),  A  and  L  satisfy 

E  (  ^  (68) 
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Then,  using  the  same  arguments  as  in  5.7.2  of  [34],  we  obtain  that 

'P[un,l  >  u{x)]  <  (3.  (69) 

Hence,  {un,l,0]  is  a  100(1  —  /3)%-confidence  interval  for  u{x).  In  view  of  Lemma  1  and  its  proof, 
we  recommend  a  number  slightly  smaller  than  1/2'?+^  as  an  estimate  of  the  lower  bound  when 
N  is  moderately  large. 

Suppose  that  the  confidence  interval  for  ^(x)  in  (51)  is  computed  independently  of  the  confi¬ 
dence  interval  for  u{x).  Then, 

(^  - rnaxiO,  +  ZaS^,N,Mi^)/'^}  +  ^n,l,  0  (70) 

is  an  approximate  100(1  —  a)(l  —  /3)%-confidence  interval  for  9(x)  for  large  M  and  N.  We  observe 
that  the  first  approach  to  computing  a  confidence  interval  for  6(x)  requires  the  solution  of  only  one 
convex  quadratic  optimization  problem  to  obtain  ^  ^  The  second  approach  requires  K  such 
solutions.  If  L  =  1,  then  K  >  log/3/log(l  —  ^n)-  Hence,  K  is  typically  moderate.  For  example,  if 
(3  =  0.01  and  =  0.49,  then  K  =  7  suffices. 

5  Algorithm  and  Consistent  Approximations 

There  are  numerous  algorithms  for  solving  stochastic  programs  similar  to  P  including  decomposi¬ 
tion  algorithms  in  cases  with  special  structure  (see,  e.g.,  [16,  13]),  stochastic  approximations  (see, 
e.g.,  [10,  6,  20,  23]),  other  versions  of  stochastic  search  (see,  e.g.,  [36]),  and  sample  average  ap¬ 
proximations  (see,  e.g.,  [34]).  A  detail  review  of  these  algorithms  is  beyond  the  scope  the  paper. 
However,  we  note  that  special  structure  may  not  be  present  and  stochastic  approximations  may  be 
problematic  to  apply  to  P  as  that  problem  possibly  involves  constraints  given  by  nonconvex  expect 
value  functions.  In  this  section,  we  use  sample  average  approximations,  the  optimality  function 
0{-),  and  9n{x)  to  construct  an  algorithm  for  P  that  converges  almost  surely  under  a  smoothness 
assumption. 

5.1  Consistent  Approximations 

We  adopt  the  framework  of  sample  average  approximation  and  define  the  sample  average  problem 

■  “in  I  fhi^)  <  0>J  e  q}.  (71) 

xGIR 

It  is  well-known  that  under  suitable  assumptions,  the  optimal  value  and  the  set  of  optimal  solutions 
of  P]\f  converges  in  some  sense  to  the  optimal  value  and  the  set  of  optimal  solutions  of  P,  respec¬ 
tively;  see  for  example  Chapter  5  of  [34] .  While  such  results  provide  guidance  to  the  selection  of  N, 
they  do  not  directly  translate  into  an  implementable  algorithm  for  solving  P.  In  particular,  if  P]\r 
is  nonconvex,  then  a  globally  optimal  solution  of  Pn  may  be  beyond  reach.  In  this  section,  we  show 
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that  Ptv  and  an  associated  optimality  function  are  weakly  consistent  approximations  (see  Section 
3.3  in  [25]  and  description  below)  of  P  and  0(-).  This  result  directly  leads  to  an  implementable 
algorithm  for  P.  Note  that  the  concept  of  consistent  approximations  in  [25]  is  not  directly  related 
to  consistency  of  estimators. 

We  need  the  following  strengthening  of  Assumption  1. 

Assumption  4  We  assume  that  for  a  given  set  S  C  IR”,  the  following  hold  for  all  j  £  qo.' 

(i)  For  almost  all  tv  G  id,  is  eontinuously  differentiable  on  S. 

(ii)  There  exists  nonnegative  valued  measurable  funetion  sueh  that  E[C{uj)]  <  oo  and  for  every 

X  £  S,  \F^{x,uj)\  <  Cfjj)  and  \\VxF^ {x,uj)\\  <  C{tv)  for  almost  all  lo  £  it. 

Assumption  4  is  identical  to  those  made  in  [2]  and  holds  for  example  in  the  context  of  estimation 
of  mixed  logit  models.  Important  models  such  as  two-stage  stochastic  programs  with  recourse 
and  conditional  Value-at-Risk  problems  involve  nonsmooth  functions  F^(-,uj)  (see  for  example  [17] 
and  [27])  and  hence  do  not  satisfy  Assumption  4  (i).  However,  recent  efforts  to  apply  smooth 
approximations  of  F^(-,uj)  appear  promising  [1,  39]  and  facilitate  the  use  of  the  algorithm  below 
also  in  these  nonsmooth  cases. 

If  Assumption  4  holds  on  an  open  set  S,  then  it  follows  from  Theorems  7.44  and  7.48  in  [34] 
that  /■^(•),  j  £  qo,  are  continuously  differentiable  on  S  and  that 

Vf{x)=E[VxF^{x,co)],  (72) 

for  all  x  G  5  and  j  G  qo.  Moreover,  fj^{-,uJ),j  £  qo,  are  continuously  differentiable  for  almost  every 
iJ  £  il.  Hence,  Pn  is  a  smooth  optimization  problem  almost  surely.  Consequently,  the  estimator 
9n{x)  of  6{x)  can  be  viewed  as  an  optimality  function  6j\f  :  IR”  ^  (oo,  0]  for  P/v-  R  follows  trivially 
that  Proposition  2  holds  with  V'(x)  and  6{x)  replaced  by  V’A''(®)  and  6]\f{x),  respectively,  and  f^{x) 
and  V f^{x),  j  G  qo,  replaced  by  fj^{x)  and  V/^(x),  j  G  qo,  respectively,  in  (5)  and  (6).  Similarly, 
Proposition  3  holds  under  suitable  assumptions  with  6{x),  f{x),  and  f{x)  replaced  by  0Ar(x),  /Ar(x), 
and  fN{xN),  where  xn  is  the  optimal  solution  of  Pn-  Hence,  6n{x)  can  be  viewed  as  a  measure  of 
the  distance  between  x  and  a  Fritz- John  point  of  P/v- 

We  adopt  the  definition  of  weakly  consistent  approximations  from  Section  3.3  in  [25]:  The 
elements  of  the  sequence  {{Pn,9n{-)}n=i  weakly  consistent  approximations  of  (P,  0(-))  if  (i) 
Pat  P,  as  a  ^  oo,  almost  surely,  and  (ii)  for  any  x  £  IR”  and  sequence  C  IR”  with 

xn  ^  X,  as  a  ^  oo,  limsupAT^oo  9n{xn)  <  9{x),  almost  surely. 

We  proceed  by  showing  that  {{PN,dN{-)}N=i  indeed  are  weakly  consistent  approximations 
of  (P,  0(-)).  We  first  consider  epiconvergence  of  Pv  to  P,  which  requires  the  following  constraint 
qualification. 
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Assumption  5  We  assume  for  a  given  set  S  C  and  for  almost  every  tJ  G  O  that  the  following 
holds:  For  every  X  G  S  satisfying  <  0,  there  exists  a  sequenee  {xm}^=i  C  S,  with'lfMixNitx)  < 
0,  sueh  that  x^^x,  as  N^oo.  □ 

When  Assumptions  4  holds  on  a  compact  set  S,  then  for  all  j  G  qo,  fj\[ix)  converges  to 
f^{x)  uniformly  on  S  almost  surely;  see  for  example  Theorem  7.48  in  [34].  It  is  well-known  that 
epiconvergence  follows  under  the  stated  assumptions.  This  result  is  given  in  the  next  proposition; 
see  for  example  Theorem  3.3.2  in  [25]  for  a  proof. 

Proposition  7  Suppose  that  Assumptions  4  and  5  hold  on  an  open  and  bounded  set  S  C  K"'  and 

C  S.  Then,  Pn  P,  as  N  ^  oo,  almost  surely.  □ 

We  next  consider  the  requirement  on  the  relationship  between  0Ar(-)  and  0{-),  as  N  ^  oo. 
When  Assumptions  4  holds  on  a  compact  set  S,  then  for  all  j  G  qo,  V/^(x)  converges  to  V f^{x) 
uniformly  on  S  almost  surely;  see  for  example  Theorem  7.48  in  [34].  Hence,  the  next  extension  of 
Theorem  1  follows  by  essentially  the  same  arguments  as  in  that  theorem’s  proof. 

Proposition  8  Suppose  that  Assumption  4  holds  on  an  open  set  S  C  IR""  and  that  X  C  S  is 

eompaet.  Then,  9n{x)  0{x),  as  N  ^  oo,  uniformly  on  X ,  almost  surely.  □ 

In  view  of  Propositions  7  and  8,  the  next  result  follows  directly  from  the  definition  of  consistent 
approximations . 

Theorem  3  Suppose  that  Assumptions  4  and  5  hold  on  an  open  set  S  C  IR"',  that  X  C  S  is  eom¬ 
paet,  and  that  X^  C  X.  Then,  {{Pn-,9n{-)}^=i  are  weakly  eonsistent  approximations  of  {P,6{-)). 

□ 

As  we  see  in  the  next  section,  this  result  directly  leads  to  an  implementable  algorithm  for  P. 

5.2  Algorithm 

We  adapt  Algorithm  Model  3.3.14  in  [25]  to  P.  In  essence,  the  resulting  algorithm  approximately 
solves  the  sequence  of  problems  {Pn}n£Ki  where  /C  is  an  order  set  of  strictly  increasing  positive 
integers  with  infinite  cardinality.  As  N  increases,  the  precision  with  which  Pj^  is  solved  increases 
too.  We  measure  the  precision  of  a  solution  of  P^r  by  means  of  the  optimality  function  9n{')-  When 
a  point  of  sufficient  precision  is  obtained  for  P/v,  then  the  algorithm  starts  solving  Pn',  where  N'  is 
the  next  integer  in  1C  after  N.  We  allow  great  flexibility  in  the  choice  of  optimization  algorithm  for 
approximately  solving  {P/vlTVeiC-  Essentially,  all  convergent  nonlinear  programming  solvers  can  be 
used. 

For  almost  every  uJ  =  {001,002, ...)  G  Cl  and  any  A  G  IN,  let  An  :  fR""  ^  2®*"  be  a  deterministic 
algorithm  map  that  defines  one  iteration  of  a  nonlinear  programming  algorithm  as  applied  to 
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P]\f  with  the  sample  coi,  C02,  ■■■,  ujn-  We  assume  that  the  algorithm  map  satisfies  the  following 
assumption. 

Assumption  6  For  almost  every  uJ  =  {uji,uj2,  ■■■)  G  and  any  N  G  IN,  we  assume  that  every 
aeeumulation  point  x  G  IR"'  of  a  sequenee  {xi}^Q  generated  by  the  algorithm  map  An{-),  i.e., 
Xj+i  G  Aj\f{xi),  satisfies  6Nix)  =  0  and  <  0- 

We  next  state  the  algorithm,  where  we  use  the  notation  k{N)  to  denote  the  smallest  N'  G  1C 
strictly  greater  than  N. 

Algorithm  1  (Solves  P) 

Input.  Function  A  :  IN  ^  (0,oo)  such  that  A(A)  ^  0,  as  A  ^  00;  an  ordered  set  JC  of  strictly 
increasing  positive  integers  with  infinite  cardinality;  parameters  e,  (5  >  0;  Aq  G  1C;  xq  G  IR""; 
and  realizations  uji,uj2,  ■■■  obtained  by  independent  sampling  from  V. 

Step  0.  Set  f  =  0,  Xg  =  xq,  and  A  =  Ag. 

Step  1.  Using  cji,  0)2,  ■■■,  ujn,  compute  G  A]\f{xi). 

Step  2.  If  9]\f{xi+i)  >  — eA(A)  and  fiNixi+i)  <  SA{N),  then  set  x*j^  =  Xj+i  and  replace  A  by 
A:(A). 

Step  3.  Replace  f  by  i  +  1,  and  go  to  Step  1. 

In  Algorithm  1 ,  P/v  and  Pjsf  are  not  independent  for  any  N,N'  G  1C  as  the  sample  is  augmented  and 
not  regenerated.  We  note  that  the  infinite  sequence  of  realizations  can  be  generated  as 

needed.  That  is,  initially,  generate  cui,  a;2,  ...,ujnq,  and  then  augment  the  sequence  as  A  is  increased. 
The  following  convergence  result  for  Algorithm  1  follows  directly  from  Theorem  3.3.15  in  [25]. 

Proposition  9  Suppose  that  Assumptions  4,  5,  and  6  hold  on  a  suffieiently  large  open  subset  of 
IR"'.  Moreover,  suppose  that  Algorithm  1  has  generated  the  sequenees  and  {xij^g  and  they 

are  bounded.  Then,  every  aeeumulation  point  x  of  satisfies  6{x)  =  0  and  'fi{x)  <  0  almost 

surely.  □ 

Algorithm  1  does  not  include  a  stopping  criterion.  One  might  run  Algorithm  1  until  a  prede¬ 
termined  computing  budget  is  exhausted  and  then  carry  out  validation  analysis  on  the  candidate 
points  {xi}  or  a  subset  thereof.  For  example,  validation  analysis  may  include  computing  confidence 
intervals  for  ip{xi)  and  6{xi)  using  (51),  (59),  and/or  (70).  The  confidence  intervals  need  to  be 
computed  using  a  sample  that  is  independent  of  the  one  used  in  Algorithm  1  to  ensure  the  stated 
approximate  coverage  probabilities. 

In  practice,  one  might  also  want  to  carry  out  sequential  validation  analysis.  That  is,  whenever 
a  new  x^  or  Xj  is  generated,  immediately  assess  its  quality.  If  the  quality  is  satisfactory,  then 
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N 

9{x) 

Gonfidence  intervals 
9{xi)  9(x2) 

10^ 

(-0.2897,0] 

(-0.6515,0] 

(-62.09,0] 

(-5774,0] 

10® 

(-0.0427,0] 

(-0.5771,0] 

(-57.98,0] 

(-5747,0] 

10^ 

(-0.0043,0] 

(-0.4617,0] 

(-57.55,0] 

(-5743,0] 

00 

0 

-0.4420 

-57.40 

-5740 

Table  1:  Example  1:  95%-confidence  intervals  for  9{x)  (column  2),  6{xi)  (column  3),  9{x2)  (column 
4),  and  9(x‘i)  (column  5)  using  (47)  with  varying  sample  size  N.  The  last  row  gives  the  true  values 
oi9{x),  9{xi),  9{x2),  and  ^(xs). 

stop  the  algorithm.  Otherwise,  let  the  algorithm  continue.  In  this  case,  the  last  candidate  solution 
generated  by  the  algorithm  is  random.  Hence,  the  confidence  intervals  derived  in  this  paper  may 
not  have  the  stated  approximate  coverage  probabilities  when  applied  to  that  last  solution;  see  [5] 
for  a  similar  situation  in  the  context  of  optimality  gap  estimation. 

6  Numerical  Examples 

In  this  section,  we  present  preliminary  numerical  tests  of  Algorithm  1  and  the  validation  analysis 
procedures  developed  in  Section  4  as  applied  to  four  simple  examples.  We  recognize  that  variance 
reduction  techniques  (see  for  example  Section  5.5  in  [34])  may  reduce  computing  times  in  validation 
analysis  and  optimization,  but  do  not  pursue  that  avenue  in  this  paper.  All  calculations  in  this 
section  are  performed  in  Matlab  7.4  on  a  2.16  GHz  laptop  computer  with  1  GB  of  RAM  and 
Windows  XP. 

6.1  Example  1:  Validation  Analysis  for  Unconstrained  Problem 

We  consider  an  instance  of  P  where  there  are  no  constraint,  n  =  20,  and  F^{-,  •)  is  defined  by 

20 

F°(x,  uj)  =  Y.  (73) 

i=l 

where  =  i,  P  =  21  —  i,  i  =  1,2,. ..,20,  and  to  =  (cu^,  ...,  w^^)'  is  a  vector  of  independent 
and  uniformly  [0, 1]  distributed  random  variables.  In  this  instance,  both  V/®(x)  and  the  unique 
global  minimizer  x  =  (10,9.5,9,8.5,  ...,0.5)'  are  easy  to  compute  explicitly.  However,  we  still  use 
the  validation  analysis  of  Section  4.1  and  compare  the  resulting  confidence  interval  of  9{x)  with  the 
true  value  of  9{x).  We  observe  that  Assumption  1  (and  4)  holds  for  this  problem  instance. 

We  consider  four  candidate  points:  the  optimal  solution  x,  a  near-optimal  point  xi  =  (10.0029, 
9.4866,  9.0071,  8.5162,  7.9931,  7.5086,  7.0125,  6.4841,  5.9856,  5.5057,  4.9960,  4.5069,  4.0082,  3.5071, 
3.0129,  2.5067,  2.0119,  1.4880,  0.9998,  0.4984)'  obtained  by  randomly  perturbing  x,  a  further-away 
point  X2  =  (9.9, 9.4,  8.9, ...,  0.4)',  and  a  relatively  far-away  point  X3  =  (9,  8.5, 8, ...,  —0.5)'. 
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Table  1  gives  95%-confidence  intervals  for  9{x)  (column  2),  9{xi)  (column  3),  9{x2)  (column 
4),  and  9{x^)  (column  5)  using  (47)  with  varying  sample  size  N .  The  last  row  gives  the  true  values 
of  0(x),  0(xi),  9{x2),  and  9{x^).  We  observe  that  the  confidence  intervals  cover  the  true  value  of  the 
optimality  function.  When  the  value  of  the  optimality  function  is  some  distance  from  zero,  a  tight 
confidence  interval  is  obtained  using  a  moderate  sample  size  N .  However,  when  the  optimality 
function  is  close  to  zero,  tightness  can  only  be  obtain  by  using  a  large  sample  size. 

We  also  apply  a  hypothesis  test  based  on  a  Chi-square  statistic  proposed  in  [35].  The  test 
involves  the  null  hypothesis  that  the  current  point  satisfies  the  KKT  conditions  and  the  alternative 
hypothesis  that  they  are  not.  For  x,  we  compute  a  p-value  of  0.20  using  a  sample  size  oi  N  =  10®. 
Hence,  with  a  test  size  of  (for  example)  0.05,  we  are  unable  to  reject  the  null  hypothesis.  For 
xi,  X2,  and  X3,  we  compute  p- values  of  essentially  zero.  Hence,  in  those  cases  we  reject  the  null 
hypothesis  with  high  confidence.  While  these  conclusions  are  reasonable,  they  do  not  directly 
provide  information  about  how  “close”  a  candidate  solution  is  to  a  Fritz-John  point.  In  practice, 
we  are  rarely  able  to  obtain  a  candidate  solution  that  is  a  Fritz-John  point.  Hence,  the  “distance” 
to  such  a  point  becomes  important.  While  [35]  provides  expressions  for  a  confidence  region  for 
V/®(x)  that  can  be  computed  and  compared  with  a  user-defined  region  containing  0  G  IR”",  it 
is  more  natural  and  convenient  to  condense  V/®(x)  into  a  single  number  as  achieved  with  the 
optimality  function.  As  we  see  next  (and  in  Section  4),  the  approach  based  on  the  optimality 
function  also  generalizes  to  constrained  problems  under  mild  assumptions. 

6.2  Example  2:  Validation  Analysis  for  Deterministically  Constrained  Problem 

The  next  problem  instance  generalizes  a  classical  problem  arising  in  search  and  detection  applica¬ 
tions.  Consider  an  area  of  interest  divided  into  n  cells.  A  stationary  target  is  located  in  one  of  the 
cells.  A  priori  information  gives  that  the  probability  that  the  target  is  in  cell  i  is  pi,  i  =  1,2, ...,  n, 
with  X]r=i  K  =  1-  The  goal  is  to  optimally  allocate  T  time  units  of  search  effort  such  that  the  prob¬ 
ability  of  not  detecting  the  target  is  minimized  (see,  e.g.,  p.  5-1  in  [38]).  We  generalize  this  problem 
and  consider  a  random  search  effectiveness  in  cell  i  per  time  unit  and  minimize  the  expected  prob¬ 
ability  of  not  detecting  the  target.  We  let  x  G  IR"",  with  x*  representing  the  number  of  time  units 
allocated  to  cell  i,  and  let  to  =  {uj^ ,10^ ,  ...,10’^)'  be  independent  lognormally  distributed  random 
variables  (with  parameters  =  lOOrr*  and  A*  =  0,  where  rt*  G  (0, 1)  are  given  data  generated  by 
independent  sampling  from  a  uniform  distribution)  representing  the  random  search  effectiveness  in 
cell  i.  Then,  the  expected  probability  of  not  detecting  the  target  is  /°(x)  =  E[F^{x,uj)],  where 
F^{x,(jj)  =  The  decision  variables  are  constrained  by  <T  and  x  >  0, 

where  we  use  T  =  1.  We  consider  n  =  100  cells.  Assumption  1  (and  4)  holds  for  this  problem 
instance. 

We  consider  three  candidate  solutions:  xi  G  IR^^®,  which  is  nearly  optimal,  X2  =  (1/100,  1/100, 
...,  1/100)'g  IR^°°,  and  X3  =  (1/50, 1/50, ...,  1/50)'  G  IR^°°,  which  is  infeasible.  We  verify  using  long 
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Method 

N  M  K 

Confidence  Intervals 

9{xi)  9(x2)  9(x3) 

(59) 

To®  30  ^ 

10^  30  - 

10®  100  - 

(-0.000630,0]  (-0.007837,0]  (-1.048609,0] 

(-0.000050,0]  (-0.007783,0]  (-1.048554,0] 

(-0.000006,0]  (-0.007483,0]  (-1.009602,0] 

(70) 

To®  ^  T” 

10^  -  5 

10®  -  5 

(-0.000464,0]  (-0.007497,0]  (-0.993391,0] 

(-0.000049,0]  (-0.007359,0]  (-0.993278,0] 

(-0.000006,0]  (-0.007365,0]  (-0.993201,0] 

“Exact” 

TTTir^  -0.00736  -0.99318 

Table  2:  Example  2:  95%-confidence  intervals  for  9{xi)  (column  3),  9{x2)  (column  4),  and  9{x^) 
(column  5)  using  (59)  (rows  3-5)  and  (70)  (rows  6-8)  with  varying  sample  size  N  and  replications 
M  and  K.  The  last  row  gives  approximate  values  of  9{xi)  (column  3),  9{x2)  (column  4),  and  9{x^) 
(column  5). 

simulations  {N  =  10®)  that  9{xi)  8  •  10“^,  9{x2)  ~  —0.00736,  and  9{x^)  —0.99318;  see  the  last 

row  of  Table  2. 

We  consider  both  confidence  interval  (59)  and  (70).  To  compute  (59),  we  first  estimate  /x  by 
solving  (14)  using  sample  size  N.  Second,  we  compute  ?77vm  using  that  /x  and  M  replications.  In 
(70),  we  use  L  =  1  which  leads  io  K  =  h  when  (3  =  0.05;  see  (68). 

Table  2  provides  95%-confidence  intervals  for  6*(xi),  9{x2)-,  and  9{x-i)  using  (59)  (rows  3-5)  and 
(70)  (rows  6-8)  with  varying  sample  size  N  and  replications  M  and  K.  It  appears  that  (59)  tends 
to  give  slightly  larger  confidence  intervals  compared  to  (70)  and  that  the  computational  effort  is 
also  greater.  However,  the  use  of  (70)  requires  Assumption  3. 

We  confirm  the  confidence  level  stipulated  by  the  confidence  interval  (70)  by  estimating  cov¬ 
erage  probabilities,  i.e.,  the  probability  that  the  random  confidence  interval  (70)  includes  9{x).  We 
find  that  99%,  98%  and  99%  of  1000  (200  in  the  case  oi  N  =  10^)  independent  replications  of 
(70)  cover  9{xi)  for  N  =  10®,  N  =  10^,  and  N  =  10®,  respectively.  Similar  calculations  for  9{x2) 
and  9{x^)  result  in  coverage  percentages  of  at  least  97%.  All  these  percentages  are  well  above  the 
stipulated  95%. 

We  also  apply  the  hypothesis  test  of  [35]  and  find  a  p- value  of  0.65  for  the  case  with  xi.  Hence, 
we  are  unable  to  reject  the  null  hypothesis  that  xi  is  a  KKT  point  using  any  reasonable  test  size. 
In  the  case  of  X2  and  X3,  the  p- values  are  essentially  zero  and  the  null  hypothesis  is  rejected  even 
with  a  small  test  size.  As  discussed  in  Section  6.1,  we  believe  that  results  of  the  kind  presented  in 
Table  2  are  more  informative  than  such  hypothesis  tests. 

6.3  Example  3:  Validation  Analysis  for  Problem  with  Expectation  Constraint 

We  next  consider  an  engineering  design  problem  where  the  cost  of  a  short  structural  column  needs 
to  be  minimized  subject  to  constraints  on  the  failure  probability  and  the  aspect  ratio;  see  [28]. 
The  design  variables  are  the  width  x^  and  depth  of  the  column.  In  [29],  we  find  that  the  failure 
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Confidence  Intervals 

Method 

N 

M 

K 

d{xi) 

9{x2) 

9(x3) 

10-^ 

30 

- 

(-0.0554,0] 

(-0.7856,0] 

(-10.0301,0] 

(59) 

10^ 

30 

- 

(-0.0074,0] 

(-0.8179,0] 

(-10.1692,0] 

10® 

100 

- 

(-0.0014,0] 

(-0.7816,0] 

(-9.8631,0] 

10-^ 

30 

5 

(-0.0595,0] 

(-0.8129,0] 

(-10.6630,0] 

(70) 

10^ 

30 

5 

(-0.0031,0] 

(-0.8229,0] 

(-10.1777,0] 

10® 

30 

5 

(-0.0003,0] 

(-0.8137,0] 

(-10.3143,0] 

Table  3:  Example  3:  90%-confidence  intervals  for  9{xi)^  9{x2),  and  6{x3)  using  (59)  (rows  3-5)  and 
(70)  (rows  6-8)  with  varying  sample  size  N  and  replications  M  and  K. 

probability  for  design  x  =  (x^,  x^)  can  be  approximated  with  high-precision  by  the  expression  E[1  — 
xlir'^ix,  cu))],  where  cu  is  a  four-dimensional  standard  normal  random  vector  modeling  random  loads 
and  material  property,  xH')  is  the  cumulative  distribution  function  of  a  Chi-squared  distributed 
random  variable  with  four  degrees  of  freedom,  and  r(x,a;)  is  the  minimum  distance  from  0  G  IR^ 
to  a  limit-state  surface  describing  the  performance  of  the  column  given  design  x  and  realization 
uj]  see  [28,  29].  The  failure  probability  is  constrained  to  be  no  greater  than  0.00135.  Hence, 
we  set  /^(x)  =  E[1  —  X4(?’^(a;, x>))]/0. 00135  —  1.  As  in  [28],  we  adopt  the  objective  function 
/°(x)  =  x^x^  and  the  additional  constraints  /^(x)  =  — x^,  /^(x)  =  — x^,  /^(x)  =  x^/x^  —  2,  and 
/^(x)  =  0.5  —  x^/x^.  In  view  of  results  in  [29],  Assumption  1  holds  for  this  problem  instance. 

We  consider  three  candidate  designs:  xi  =  (0.334,0.586)'  is  the  best  point  reported  in  [28]; 
X2  =  (0.346,0.553)'  is  an  infeasible  solution  reported  in  [28],  and  X3  =  (0.586,0.334)'  is  the  “mirror 
image”  of  xi.  Table  3  presents  similar  confidence  intervals  as  in  Table  2,  but  with  a  =  0.1  in 
(59)  and  a  =  f3  =  0.05  in  (70).  The  methods  give  comparable  results.  As  observed  earlier,  a  near 
optimal  solution  may  require  a  relatively  large  sample  size  to  ensure  a  tight  confidence  interval. 

6.4  Example  4:  Optimization  and  Validation  Analysis  for  Full  Problem 

We  next  illustrate  Algorithm  1  by  considering  an  extension  of  Example  1.  Let  E*^(-,  •)  be  as  defined 
in  that  example  (see  (73))  and  also  define  and  E^(-,-)  similarly,  but  with  a'  and  6*  being 

randomly  and  independently  generated  from  a  uniform  distribution  supported  on  [0, 10]  and  [0,  2], 
respectively.  Moreover,  we  subtract  100  from  these  expression  to  construct  constraints  of  the  form 
-®Ei=i  a*(®*  “  —  100]  <  0.  Hence,  the  resulting  instance  of  P  involves  20  decision  variables, 

60  independent  random  variables  with  uniform  distribution  each  supported  on  [0, 1] ,  an  expected 
value  objective  function,  and  two  expected  value  constraint  functions.  Assumption  4  holds  for  this 
problem  instance. 

We  apply  Algorithm  1  to  this  problem  instance  using  xq  =  0  G  IR^^,  Nq  =  100,  A(A)  =  I/'/N, 
and  e  =  6  =  1.  Moreover,  we  let  k{N)  =  2N.  The  algorithm  map  is  one  iteration  the  Polak- 

He  Phase  1-Phase  2  algorithm;  see  Section  2.6  in  [25].  We  refer  to  the  iterations  of  Algorithm  1  with 
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Candidate 

Point 

N 

Confidence  Intervals 
i;{x%)  9{x%) 

Xg 

100 

- 

(-00,-48.1472) 

(-431.1261,0] 

(5296,  5447) 

/V.* 

*^100 

100 

302 

(-00,-2.0657) 

(-8.9403,0] 

(3411,  3533) 

2^200 

200 

106 

(-00,-0.4903) 

(-3.5880,0] 

(3439,  3521) 

2^400 

400 

104 

(-00,0.5280) 

(-2.0762,0] 

(3419,  3477) 

■^800 

800 

149 

(-00,0.0672) 

(-1.4028,0] 

(3458,  3498) 

2^1600 

1600 

66 

(-00, -0.0001) 

(-0.7915,0] 

(3453,  3482) 

/y.* 

*^3200 

3200 

60 

(-00, -0.0107) 

(-0.4043,0] 

(3462,  3482) 

2^6400 

6400 

75 

(-00,0.0785) 

(-0.2027,0] 

(3466,  3481) 

2^12800 

12800 

129 

(-00,0.0125) 

(-0.1082,0] 

(3470,  3480) 

2^25600 

25600 

79 

(-00,  0.0607) 

(-0.1085,0] 

(3467,  3474) 

2^51200 

51200 

99 

(-00,0.0499) 

(-0.0609,0] 

(3467,  3472) 

Table  4:  Example  4:  95%-confidence  intervals  for  (column  4)  and  (column  6),  and 

90%-confidence  intervals  for  9{x*jq)  (column  5)  for  different  candidate  points  generated  by  Algorithm 

1. 

the  same  sample  size  N  as  a  stage.  We  run  Algorithm  1  for  ten  stages  and  generate  the  candidate 
points  Xg,  x^QQ,  X200,.")  2^51200-  each  candidate  point  x^,  we  compute  the  confidence  intervals 
(51)  and  (70)  using  sample  size  lOA^  (1000  for  Xg),  replications  M  =  30  and  K  =  23,  and  L  =  1. 
This  selection  of  M,  K,  and  L  results  in  95%  confidence  intervals  for  'ip{x*j^)  and  90%-confidence 
intervals  for  0(x^). 

Table  4  presents  the  confidence  intervals  for  the  candidate  points.  Columns  2  and  3  give 
the  sample  size  and  number  of  iterations  used  in  each  stage,  respectively.  Columns  4  and  5  give 
95%  confidence  intervals  for  V’(a^^)  and  90%  confidence  intervals  for  0(x^),  respectively.  We  also 
compute  95%  confidence  intervals  for  /°(xj^)  using  the  standard  sample  average  estimators;  see 
column  6.  We  see  that  x*j^  tends  to  become  closer,  as  measured  by  6{-),  to  a  Fritz-John  point  as  the 
calculations  progress.  The  ten  stages  required  6900  seconds  of  run  time.  The  verification  analysis 
needed  3300  seconds. 

7  Conclusions 

We  have  developed  validation  analysis  procedures  for  nonlinear,  possibly  nonconvex,  stochastic 
programs  with  expected  value  functions  as  both  objective  and  constraint  functions.  The  validation 
analysis  assesses  the  quality  of  a  candidate  solution  x  G  IR*^  by  its  proximity  to  a  Fritz-John 
stationary  point  as  measured  by  the  value  of  an  optimality  function  at  x.  We  construct  an  estimator 
of  the  optimality  function  and  examine  its  consistency,  bias,  and  asymptotic  distribution.  The 
estimator  leads  to  confidence  intervals  for  the  value  of  the  optimality  function  at  x  and,  hence, 
confidence  intervals  for  the  “quality”  of  x.  We  also  construct  an  implementable  algorithm  for 
solving  smooth  stochastic  programs  based  on  sample  average  approximations  and  a  gradual  increase 
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in  sample  size. 
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Appendix 

Proof  of  Proposition  3  Let  x  G  X^.  We  only  consider  x  ^  x  since  the  result  trivially  holds  when 
X  =  X.  We  define  •)  :  fR”  ^  IR  for  any  x'  G  fR”  by  •^{x,x')  =  max{/°(x')  —  /°(x),  V’Ca^OI-  It 
follows  by  the  mean  value  theorem  and  (9)  that  for  any  x'  G  IR*^  and  some  G  [0, 1],  j  G  qo, 

'4){x,x')  =  Taax^{f^{x),x' —  x)  +  ^{x' —  x,V'^f^{x  +  s^{x' —  x)){x  —  x)), 

max{/I(x)  +  (/t(x),  x'  —  x)  +  i(x'  —  x,  V^/-^(x  +  s^{x'  —  x)){x'  —  x))}| 
i6q  ^  J 

<  ^max{(/°(x),M(x' -x))  +  t||M(x' -x)f ,  (74) 

max{/t(x)  +  (/t(x),M(x'  —  x))  +  i||M(x'  —  x)|p}|. 
i6q  ^  J 

Minimizing  first  the  right-hand  and  then  the  left-hand  side  in  (74)  and  using  the  fact  that  ^{x)^  =  0, 
we  obtain  that 

min  '0(x,x')  <  6{x)/M.  (75) 

x'eiR" 

Using  similar  arguments,  we  also  obtain  that 

min  ip{x,x)  >9{x)/m.  (76) 

x'eiR" 

Let  x'  G  IR'^  be  the  unique  optimal  solution  of  min^-zgiRn  ^(x,  x').  Since  ^(x,x)  =  0,  it  follows 
that  x'  G  X^.  From  (75),  we  obtain  that 

fix)  -  fix)  =  ^min^{/°(x')  -  fix)  \  V’(x')  <  0} 

<  min  {il>ix,x')  \  'ij^ix)  <  0} 
x'eiR" 

=  '0(x,X^) 

<  0(x)/M, 

which  proves  the  right-most  inequality  in  (10).  We  next  prove  the  left-most  inequality  and  consider 
three  cases. 

(i)  Suppose  that  V’(®0  <  ^(x,x')  and  fix')  —  fix)  =  '0(x,x').  Then, 

^min^V;(x,x  )  =  ^min^{/°(x')  -  fix)  \  V’(x')  <  0}  =  fix)  -  fix).  (77) 
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Hence,  by(76),  9{x)/m  <  /°(x)  —  /°(x). 

(ii)  Suppose  that  ip{x')  =  'ip{x,x')  and  f^{x')  —  f^{x)  =  We  define  h  =  x  —  x' .  Since 

x'  is  the  unconstrained  minimizer  oi  ^{x,  ■),  it  follows  that  the  directional  derivative  of  •)  at 
x'  is  nonnegative  in  all  directions,  i.e., 

d'4){x,x;h)  =  m.ax{{V f^{x),h),  d'ip{x  ,h)}  >  0,  (78) 

for  all  h  £  IR”.  By  strict  convexity  of 

{Vf\x'),h)  <  ifix)  -  fix))  -  {fix')  -  fix))  <  0.  (79) 


Consequently, 


Now,  let  j'  £  q(x')  be  such  that  diliix';  h) 

(9)  , 


difiix  ,  h)  >  0.  (80) 

=  iV  f^'  ix'),  h).  Then,  by  the  mean  value  theorem  and 


fix)  >  fix')  +  {Vp'ix'),  h)  +  \m\\hf.  (81) 

Hence,  using  (80)  and  (76),  we  obtain 

i^ix)  >  ffx)  >  il^ix)  +  dil^ix  f)  +  \m\\h\\^  (82) 

>  9ix)/m  + 


Since  V'(^)  <  0,  we  find  that 

ii/iii  <  — (83) 

m  ^ 

In  view  of  (9),  is  compact.  Hence,  there  exists  a  constant  c  <  oo  such  that  || V/*^(a:') ||  <  c/4 
for  all  x'  £  X^.  It  now  follows  from  (79)  and  (76)  that 

fix) -fix)  >  fix')- fix) +  iV  fix'), h) 

>  9ix)/m-\\Vfix')\\\\h\\ 
y  -  cf-9ix) 

“  m 

(hi)  Suppose  that  ipix')  =  ^(x,x')  and  fix')  —  fix)  <  ^(x,x').  Then,  due  to  the  optimality 
of  x'  for  'ipix,  •),  dipix' ,x'  —  x')  >0  for  all  x'  £  IR"".  Using  similar  arguments  as  in  (82),  we  obtain 
that  for  any  x'  £  X^, 


and 


0  >  iiix') 


>  ipix)  +  dipix']  x  —  x')  +  ^m\\x'  —  x'W^ 

>  9ix) /m  +  \m\\x' —  x'W^ 


(85) 


\x 


(86) 
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Hence 


X  —  x\\  <  llx 


x'W  +  llx 


It  now  follows  from  convexity  of  /^(•)  and  (76) 


/°(x)  -  f{x)  > 
> 
> 


that 

(V/°(x),x  -  x) 
-||V/°(x)||||x-x 


The  left-most  inequality  (10)  now  follows  as  a  consequence  of  these  three  cases. 


(87) 


(88) 

□ 
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